Trade&Ahead

Background:

Objective:

Data Dictionary

  1. Ticker Symbol: An abbreviation used to uniquely identify publicly traded shares of a particular stock on a particular stock market
  2. Company: Name of the company
  3. GICS Sector: The specific economic sector assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations
  4. GICS Sub Industry: The specific sub-industry group assigned to a company by the Global Industry Classification Standard (GICS) that best defines its business operations
  5. Current Price: Current stock price in dollars
  6. Price Change: Percentage change in the stock price over 13 weeks
  7. Volatility: Standard deviation of the stock price over the past 13 weeks
  8. ROE: A measure of financial performance calculated by dividing net income by shareholders' equity (shareholders' equity is equal to a company's total assets minus its total liabilities)
  9. Cash Ratio: The ratio of a company's total reserves of cash and cash equivalents to its total current liabilities
  10. Net Cash Flow: The difference between a company's cash inflows and outflows (in dollars)
  11. Net Income: Revenues minus expenses, interest, and taxes (in dollars)
  12. Earnings Per Share: Company's net profit divided by the number of common shares it has outstanding (in dollars)
  13. Estimated Shares Outstanding: Number of the company's shares currently held by all its shareholders
  14. P/E Ratio: Ratio of the company's current stock price to the earnings per share
  15. P/B Ratio: Ratio of the company's stock price per share to its book value per share (the book value of a company is the net difference between its total assets and total liabilities)
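The ratio definitions above can be checked with a small worked example. This is a plain-Python sketch with made-up figures (not taken from the dataset) and a hypothetical helper `financial_ratios`; it computes shareholders' equity as total assets minus total liabilities. The dataset itself may store some of these columns in other units (for example ROE as a percentage).

```python
def financial_ratios(price, net_income, total_assets, total_liabilities,
                     cash_and_equivalents, current_liabilities, shares_outstanding):
    """Compute the data-dictionary ratios from raw figures (hypothetical helper)."""
    equity = total_assets - total_liabilities          # shareholders' equity (book value)
    eps = net_income / shares_outstanding              # earnings per share
    book_value_per_share = equity / shares_outstanding
    return {
        "ROE": net_income / equity,
        "Cash Ratio": cash_and_equivalents / current_liabilities,
        "Earnings Per Share": eps,
        "P/E Ratio": price / eps,
        "P/B Ratio": price / book_value_per_share,
    }


# Made-up example company: $50 share price, $1M net income, 200k shares
ratios = financial_ratios(
    price=50.0,
    net_income=1_000_000,
    total_assets=10_000_000,
    total_liabilities=6_000_000,
    cash_and_equivalents=500_000,
    current_liabilities=1_000_000,
    shares_outstanding=200_000,
)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

With these inputs equity is $4M, so ROE is 0.25, the cash ratio is 0.5, EPS is $5, P/E is 10 and P/B is 2.5.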

Importing necessary libraries and data

Data Overview

Let's load the data

Also, take a backup of the data

Let's check the head and tail of the data

Observations:

Shape of the data

Observations:

Information about the dataset

Observations:

Missing value check
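A missing-value check counts empty entries per column. In pandas this is commonly written as `df.isnull().sum()`; the dependency-free sketch below shows the same idea on a few invented records (the tickers and values are illustrative only), treating both `None` and NaN as missing. The helper names are hypothetical.

```python
import math

# Invented records for illustration only
records = [
    {"Ticker Symbol": "AAA", "Current Price": 10.0, "P/E Ratio": 12.5},
    {"Ticker Symbol": "BBB", "Current Price": None, "P/E Ratio": 8.0},
    {"Ticker Symbol": "CCC", "Current Price": 42.0, "P/E Ratio": float("nan")},
]


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing(rows):
    """Return {column: number of missing entries} across all rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + int(is_missing(value))
    return counts


print(count_missing(records))
```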

Observations:

Duplicate value check
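A duplicate check flags rows that repeat an earlier row. pandas offers `df.duplicated().sum()` for this; the sketch below mirrors its default behaviour (`keep="first"`, so the first occurrence is not flagged) using invented records and a hypothetical helper `duplicate_indices`. Passing a column name such as `"Ticker Symbol"` checks the uniqueness of that key instead of whole rows.

```python
def duplicate_indices(rows, key=None):
    """Indices of rows that repeat an earlier row (or an earlier value of `key`).

    As with pandas' duplicated() default, the first occurrence is not flagged.
    """
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        # Whole-row signature: sorted (column, value) pairs; values must be hashable
        signature = tuple(sorted(row.items())) if key is None else row[key]
        if signature in seen:
            duplicates.append(i)
        else:
            seen.add(signature)
    return duplicates


# Invented records: row 2 repeats row 0 exactly; row 3 reuses ticker "BBB"
records = [
    {"Ticker Symbol": "AAA", "Current Price": 10.0},
    {"Ticker Symbol": "BBB", "Current Price": 20.0},
    {"Ticker Symbol": "AAA", "Current Price": 10.0},
    {"Ticker Symbol": "BBB", "Current Price": 21.0},
]
print(duplicate_indices(records))                        # whole-row duplicates
print(duplicate_indices(records, key="Ticker Symbol"))   # repeated tickers
```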

Observations:

Exploratory Data Analysis (EDA)

Questions:

  1. What does the distribution of stock prices look like?
  2. The stocks of which economic sector have seen the maximum price increase on average?
  3. How are the different variables correlated with each other?
  4. Cash ratio provides a measure of a company's ability to cover its short-term obligations using only cash and cash equivalents. How does the average cash ratio vary across economic sectors?
  5. P/E ratios can help determine the relative value of a company's shares as they signify the amount of money an investor is willing to invest in a single share of a company per dollar of its earnings. How does the P/E ratio vary, on average, across economic sectors?
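Questions 2, 4 and 5 all reduce to a per-sector average. In pandas this would plausibly be `df.groupby("GICS Sector")["Price Change"].mean()`; the plain-Python sketch below performs the same aggregation on toy rows so the mechanics are visible. The sector names are real GICS sectors, but every number is invented, and `mean_by_group` is a hypothetical helper.

```python
from collections import defaultdict

# Toy rows; the numbers are invented, not taken from the dataset
rows = [
    {"GICS Sector": "Energy", "Price Change": -10.0, "Cash Ratio": 40.0},
    {"GICS Sector": "Energy", "Price Change": -20.0, "Cash Ratio": 60.0},
    {"GICS Sector": "Information Technology", "Price Change": 5.0, "Cash Ratio": 150.0},
    {"GICS Sector": "Information Technology", "Price Change": 15.0, "Cash Ratio": 250.0},
    {"GICS Sector": "Health Care", "Price Change": 8.0, "Cash Ratio": 100.0},
]


def mean_by_group(rows, group_col, value_col):
    """Average of value_col within each distinct value of group_col."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_col]] += row[value_col]
        counts[row[group_col]] += 1
    return {group: sums[group] / counts[group] for group in sums}


price_change = mean_by_group(rows, "GICS Sector", "Price Change")
print(price_change)
print("Largest average price increase:", max(price_change, key=price_change.get))
```

The same helper answers the cash-ratio and P/E questions by changing `value_col`.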

Checking the summary of the data

Observations:

Summary of the categorical variables

Observations:

Univariate Analysis

First, let's create helper functions that plot a boxplot and histogram for numerical variables and a labeled bar plot for categorical variables

Checking the numerical columns and their distributions

Observations:

Checking the proportions in GICS Sector

Observations:

Checking GICS Sub Industry

Observations:

Bivariate Analysis

Let's plot the pairplot and try to identify any relationships among the variables

Observations:

Let's also check the correlation heatmap of the variables
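A correlation heatmap is a colour-coded matrix of pairwise correlation coefficients. pandas' `df.corr()` computes Pearson correlations by default, and seaborn's `heatmap` is a common way to draw them. The sketch below computes the Pearson matrix by hand on invented columns; `pearson` and `correlation_matrix` are hypothetical helper names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)   # undefined if either column is constant


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a {name: values} mapping."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Invented columns: EPS rises with net income, P/E moves the other way
columns = {
    "Net Income": [1.0, 2.0, 3.0, 4.0],
    "Earnings Per Share": [0.5, 1.1, 1.4, 2.0],
    "P/E Ratio": [30.0, 25.0, 18.0, 12.0],
}
matrix = correlation_matrix(columns)
for (a, b), r in matrix.items():
    if a < b:
        print(f"{a} vs {b}: {r:+.3f}")
```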

Observations:

Checking Price Change vs GICS Sector

Observations:

Checking Cash Ratio vs Economic Sector

Observations:

Checking P/E ratio vs GICS Sector

Observations:

Checking estimated shares outstanding by GICS sector

Observations:

Data Preprocessing

Observations:

Let's make a copy of the dataset on which we will check and treat outliers

Let's plot boxplots to check the outliers

Observations:

Let's write a function which will treat the outliers
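The outline does not state which rule the treatment function applies. A common choice, assumed here, is to clip every value to the 1.5 × IQR whiskers that a boxplot draws, so extreme values are pulled in rather than dropped. The sketch below uses linear-interpolation quartiles (NumPy's default percentile method); `quantile` and `cap_outliers` are hypothetical names, and the prices are invented.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already-sorted list."""
    position = (len(sorted_values) - 1) * q
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def cap_outliers(values, whisker=1.5):
    """Clip values to [Q1 - whisker*IQR, Q3 + whisker*IQR] (assumed treatment rule)."""
    ordered = sorted(values)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [min(max(v, low), high) for v in values]


# Invented prices with one extreme value
prices = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(cap_outliers(prices))
```

Here Q1 = 3.25 and Q3 = 7.75, so the upper whisker is 14.5 and only the value 100 is clipped.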

Treating outliers

Let's check if the outliers are treated

Observations:

Scaling
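Both K-means and hierarchical clustering rely on distances, so columns measured in dollars (Net Income, Net Cash Flow) would dominate small ratios unless every column is put on a comparable scale. A typical choice is z-score standardization, which is what scikit-learn's `StandardScaler` does (it divides by the population standard deviation). The helper below is a hypothetical plain-Python sketch of that transformation for a single column.

```python
import math


def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std (ddof=0)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]   # constant column: nothing to scale
    return [(v - mean) / std for v in values]


print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
```

For this input the mean is 5 and the population standard deviation is 2, so the scaled column has mean 0 and unit variance.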

Observations:

Let's also create a scaled dataframe from the outlier-treated dataset

Observations:

EDA

Checking that the distributions of the variables haven't changed

Observations:

Let's also check the pairplot for the variables

Observations:

Let's also check the correlation heatmap

Observations:

Exploring K-means Clustering

Let's find out the optimal K value for K-means clustering
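The elbow method fits K-means for a range of K values and plots the inertia (the sum of squared distances from each point to its assigned centroid); the "elbow" where the curve stops dropping sharply suggests a reasonable K. scikit-learn's `KMeans` exposes this value as `inertia_` and uses smarter seeding and several restarts. The sketch below is a bare Lloyd's-algorithm implementation with a single random start, written only to show what is being measured; the data are invented and already scaled.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids, inertia)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]   # k random rows as initial centroids
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Mean of the members; an empty cluster keeps its previous centroid
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            )
        if new_centroids == centroids:   # converged: centroids stopped moving
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Invented, already-scaled 2-D data with two obvious groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in range(1, 5):
    print(k, round(kmeans(points, k)[2], 3))
```

On this toy data the inertia collapses between K = 1 and K = 2, which is the elbow.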

Observations:

Let's find the optimal value of K in the dataset with outliers removed

Observations:

Let's check the Silhouette scores for both the original and outlier-removed datasets
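For each point, the silhouette value is (b − a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its mean distance to the nearest other cluster; the score is the average over all points, ranging from −1 to 1, with higher values meaning tighter, better-separated clusters. scikit-learn provides this as `sklearn.metrics.silhouette_score`. The hypothetical `mean_silhouette` below is a plain-Python sketch of the same definition (singleton clusters score 0) on invented data.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette(points, labels):
    """Average silhouette value over all points; needs at least two clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        same = [euclidean(p, q) for j, q in enumerate(points) if labels[j] == own and j != i]
        if not same:
            scores.append(0.0)   # convention for a point alone in its cluster
            continue
        a = sum(same) / len(same)   # cohesion: mean distance within own cluster
        b = min(                    # separation: mean distance to the nearest other cluster
            sum(euclidean(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != own
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]   # matches the two visible groups
bad = [0, 1, 0, 1, 0, 1]    # mixes them up
print(round(mean_silhouette(points, good), 3), round(mean_silhouette(points, bad), 3))
```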

For the original dataset

For the outlier-removed dataset

Observations:

Visualizing Silhouette scores for the original scaled dataset

For K value 7

For K value 6

For K value 5

Observations:

Visualizing Silhouette scores for the outlier-removed dataset

For K value 7

For K value 6

For K value 5

Observations:

From the exploration above, we conclude that K-means performs better with K = 5 on the outlier-removed scaled dataset

Fitting K-means on the outlier-removed dataset with K = 5

Adding cluster labels to the original dataset

Creating cluster profiles from the K-means clusters

Adding counts for each segment
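A cluster profile is the mean of each numeric column within each cluster, plus the number of companies in that cluster; in pandas this is roughly a `groupby` on the label column followed by `.mean()`, with the counts added separately. The sketch below does this in plain Python on invented rows; the `cluster` column and the `count_in_each_segment` key are hypothetical names, not necessarily those used in the notebook.

```python
from collections import defaultdict


def cluster_profile(rows, label_col, numeric_cols):
    """Per-cluster means of numeric_cols plus the number of rows in each cluster."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_col]].append(row)
    profile = {}
    for label in sorted(groups):
        members = groups[label]
        stats = {col: sum(r[col] for r in members) / len(members) for col in numeric_cols}
        stats["count_in_each_segment"] = len(members)   # hypothetical column name
        profile[label] = stats
    return profile


# Invented rows that already carry a cluster label (e.g. from K-means)
rows = [
    {"cluster": 0, "Current Price": 40.0, "Volatility": 1.0},
    {"cluster": 0, "Current Price": 60.0, "Volatility": 2.0},
    {"cluster": 1, "Current Price": 500.0, "Volatility": 2.5},
]
profile = cluster_profile(rows, "cluster", ["Current Price", "Volatility"])
for label, stats in profile.items():
    print(label, stats)
```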

Display cluster profiles

Visualizing the clusters

Plotting the scaled data, which helps us visualize the clusters better

Cluster Profiling - Insights from the clusters obtained with K-means

Hierarchical Clustering

Let's first explore which combinations of distance metric and linkage method perform well on our scaled dataset
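The distance metric decides what "close" means before any linkage is applied. The sketch below defines four metrics that SciPy's distance functions also accept by name (`"euclidean"`, `"cityblock"`, `"chebyshev"`, `"cosine"`); which metrics the notebook actually compares is not stated, so this selection is an assumption. The function names here are hypothetical plain-Python stand-ins.

```python
import math


def euclidean(a, b):          # straight-line distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):          # sum of absolute differences; SciPy calls it "cityblock"
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):          # largest single-coordinate difference
    return max(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):             # 1 - cosine similarity; ignores vector length
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


a, b = (0.0, 0.0), (3.0, 4.0)
for metric in (euclidean, manhattan, chebyshev):
    print(metric.__name__, metric(a, b))
print("cosine", cosine((1.0, 2.0), (2.0, 4.0)))
```

For (0, 0) and (3, 4) the three coordinate metrics give 5, 7 and 4, while the cosine distance between two parallel vectors is 0 regardless of their lengths.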

Observations:

Let's explore different linkage methods with Euclidean distance

Observations:

Let's plot the dendrograms for the different linkage methods

Observations:

Let's apply agglomerative clustering
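Agglomerative clustering starts with every point in its own cluster and repeatedly merges the two closest clusters until the requested number remains; the linkage method defines "closest" (single: nearest pair of points, complete: farthest pair, average: mean over all pairs). scikit-learn's `AgglomerativeClustering` offers these plus Ward, which instead merges the pair that least increases total within-cluster variance and is not implemented here. The naive sketch below (hypothetical `agglomerative`, Euclidean distance, invented data) is for illustration only; it is far too slow for large datasets.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(points, n_clusters, linkage="average"):
    """Naive bottom-up clustering; returns one integer label per point."""
    clusters = [[i] for i in range(len(points))]   # start: every point alone

    def cluster_distance(c1, c2):
        d = [euclidean(points[i], points[j]) for i in c1 for j in c2]
        if linkage == "single":
            return min(d)
        if linkage == "complete":
            return max(d)
        if linkage == "average":
            return sum(d) / len(d)
        raise ValueError(f"unsupported linkage: {linkage}")

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under the chosen linkage
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]   # merge y into x
        del clusters[y]                           # y > x, so index x stays valid

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for method in ("single", "complete", "average"):
    print(method, agglomerative(points, 2, method))
```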

Adding the cluster labels to the dataset

Cluster Profiling

Observations:

Exploring 'Ward' linkage method

Observations:

Display Cluster profile

Visualizing the clusters

Cluster Profiling

K-means vs Hierarchical Clustering

Actionable Insights and Recommendations